Skip to content

Conversation

@THardy98
Copy link
Contributor

@THardy98 THardy98 commented Oct 15, 2025

What was changed

Added include-retry-scenarios option to throughput_stress scenario.

Setting --option include-retry-scenarios=true adds activities that fail and retry, as actions to throughput_stress.

NOTE: enabling this option increases runtime of a single iteration from ~23s to ~44s!

New activity actions were added to for this fail-and-retry coverage:

  • retryable activity
  • timeout activity
  • heartbeat timeout activity

The added proto definitions are the same. I've separated them because they cover different activity failure cases, which I think would be neater to handle separately in the workers rather than a single failure activity. It also allows them to be expanded separately and more specifically.

I've added these actions to the existing throughput stress scenario, as well as a cancellable activity action.

Added to corresponding logic to each worker to handle these new activity actions.

Why?

Provide additional activity coverage, particularly in cases where the activity does not succeed on the first attempt.

  1. How was this tested:
    Ran the existing throughput_stress_test.go test.
    Ran:
go run ./cmd run-scenario-with-worker \
    --scenario throughput_stress \
    --language go \
    --iterations 10 \
    --option internal-iterations=10 \
    --option include-retry-scenarios=true

Having trouble running throughput stress on other language workers (even before this change). Will address small fixes in subsequent PRs.

  1. Any docs updates needed?
    No

@THardy98 THardy98 force-pushed the tps_activity_failures branch 2 times, most recently from 20765ba to 7e2604d Compare October 15, 2025 16:54
@THardy98 THardy98 marked this pull request as ready for review October 16, 2025 17:04
@THardy98 THardy98 requested a review from a team as a code owner October 16, 2025 17:04
@THardy98
Copy link
Contributor Author

@stephanos @Sushisource 👀

Curious what you think about the timeouts. I'm a little bit wary that they're kind of baked into tps now, which raises the floor run length time. In practice though, probably not an issue given that it's expected to run for extended periods of time

@THardy98 THardy98 changed the title Added actions for retryable activity failures. Added --include-retry-scenarios flag to throughput_stress scenario Oct 18, 2025
@THardy98 THardy98 changed the title Added --include-retry-scenarios flag to throughput_stress scenario Added --include-retry-scenarios option to throughput_stress scenario Oct 18, 2025
@THardy98
Copy link
Contributor Author

Was concerned about the runtime, so decided to put this behind an option --include-retry-scenarios for throughput_stress.

Comment on lines 408 to 414
DelayActivityWithCancellation(1*time.Second, 10*time.Second),
// Test activity retry: fails, succeeds on retry
RetryableErrorActivity(1, RemoteActivityWithRetry(1*time.Second, 2, 500*time.Millisecond, 1.0)),
// Test activity timeout with retry: times out on 1st attempt, succeeds on 2nd
TimeoutActivity(1, RemoteActivityWithRetry(1*time.Second, 2, 500*time.Millisecond, 1.0)),
// Test heartbeat timeout: skips heartbeats on 1st attempt, sends them on 2nd
HeartbeatActivity(1, RemoteActivityWithHeartbeat(10*time.Second, 1*time.Second, 2, 500*time.Millisecond, 1.0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic: If these helpers are only used here I'd probably just inline it. Having helpers for the DSL feels a little over-verbose unless the helper is doing a lot of heavy lifting or re-used a lot (which I believe is the case with the ones already present). That said I can see how it makes this nice too, so, 🤷 . Curious if @stephanos has an opinion.

// Activity that runs too long for N attempts (causing timeout), then completes.
// Tests StartToCloseTimeout behavior with retries.
// On failing attempts, the activity runs until context cancellation (timeout).
// On success, it runs for duration < StartToCloseTimeout (runs for 0.5x StartToClose timeout).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// On success, it runs for duration < StartToCloseTimeout (runs for 0.5x StartToClose timeout).
// On success, it runs for 0.5x StartToClose timeout.

If it always does that, just say that.

That said - I would probably just parameterize the runtime too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fields for success/failure durations


// Activity that skips heartbeats for N attempts (causing heartbeat timeout), then sends them.
// Tests HeartbeatTimeout behavior with retries.
// Duration is derived from activity info's HeartbeatTimeout (runs for 2x heartbeat timeout).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Duration is derived from activity info's HeartbeatTimeout (runs for 2x heartbeat timeout).
// On success, runs for 2x heartbeat timeout.

To be consistent with the other one - but, again I would also probably just parameterize here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@THardy98 THardy98 changed the title Added --include-retry-scenarios option to throughput_stress scenario Added include-retry-scenarios option to throughput_stress scenario Oct 21, 2025
@THardy98 THardy98 requested a review from Sushisource October 21, 2025 22:07
Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me

@THardy98 THardy98 force-pushed the tps_activity_failures branch from 6e9e887 to a378468 Compare October 22, 2025 19:36
@THardy98 THardy98 force-pushed the tps_activity_failures branch from a378468 to 192a528 Compare November 21, 2025 19:03
@THardy98 THardy98 merged commit dbd8ca2 into main Nov 21, 2025
28 checks passed
@THardy98 THardy98 deleted the tps_activity_failures branch November 21, 2025 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants